A Transcription Task for Crowdsourcing with Automatic Quality Control

نویسندگان

  • Chia-ying Lee
  • James R. Glass
چکیده

In this paper, we propose a two-stage transcription task design for crowdsourcing with an automatic quality control mechanism embedded in each stage. For the first stage, a support vector machine (SVM) classifier is utilized to quickly filter poor quality transcripts based on acoustic cues and language patterns in the transcript. In the second stage, word level confidence scores are used to estimate a transcription quality and provide instantaneous feedback to the transcriber. The proposed design was evaluated using Amazon Mechanical Turk (MTurk) and tested on seven hours of academic lecture speech, which is typically conversational in nature and contains technical material. Compared to baseline transcripts which were also collected from MTurk using a ROVER-based method, we observed that the new method resulted in higher quality transcripts while requiring less transcriber effort.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription

In this paper, we investigate different approaches in crowdsourcing transcriptions of Dialectal Arabic speech with automatic quality control to ensure good transcription at the source. Since Dialectal Arabic has no standard orthographic representation, it is very challenging to perform quality control. We propose a complete recipe for speech transcription quality control that includes using out...

متن کامل

Using Crowdsourcing for Multi-label Biomedical Compound Figure Annotation

Information analysis or retrieval for images in the biomedical literature needs to deal with a large amount of compound figures (figures containing several subfigures), as they constitute probably more than half of all images in repositories such as PubMed Central, which was the data set used for the task. The ImageCLEFmed benchmark proposed among other tasks in 2015 and 2016 a multi-label clas...

متن کامل

POMDP-based control of workflows for crowdsourcing

a r t i c l e i n f o a b s t r a c t Crowdsourcing, outsourcing of tasks to a crowd of unknown people (" workers ") in an open call, is rapidly rising in popularity. It is already being heavily used by numerous employers (" requesters ") for solving a wide variety of tasks, such as audio transcription, content screening, and labeling training data for machine learning. However, quality control...

متن کامل

Selection and aggregation techniques for crowdsourced semantic annotation task

Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks like semantic annotation transfer require workers to take simultaneous decisions on chunk segmentation and labeling while acquiring on-the-go domainspecific knowledge. The incre...

متن کامل

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011